mysql delete语句使用where in不走索引的问题解决
1、背景介绍:
想要删除某个表中重复行的数据
先使用聚合函数查询到哪些数据是重复的:select update_time from test_data group by update_time having count(1) > 1;
2、根据update_time数据相同的值找到对应的id,根据主键id进行删除:delete from test_data where id in (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t);
然后查看这个sql的执行计划
explain delete from test_data where id in (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t);
1 | root@db 12:16: [tmp_opa_user]> explain delete from test_data where id in (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t); |
这种方法没有走主键索引,删除耗时:
1 | root@db 12:27: [tmp_opa_user]> delete from test_data where id in (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t); |
3、更改delete删除方法,使用delete inner join的方法:
sql如下:
explain delete a from test_data a inner join (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t) d on d.id = a.id;
1
2
3
4
5
6
7
8
9
10explain delete a from test_data a inner join (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t) d on d.id = a.id;
+------+-------------+------------+--------+---------------+-----------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+--------+---------------+-----------------+---------+------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1003893 | Using where |
| 1 | PRIMARY | a | eq_ref | PRIMARY | PRIMARY | 8 | d.id | 1 | |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 1003893 | |
| 3 | DERIVED | test_data | index | NULL | idx_update_time | 7 | NULL | 1003893 | Using index |
+------+-------------+------------+--------+---------------+-----------------+---------+------+---------+-------------+
4 rows in set (0.000 sec)
执行计划提示1执行删除的时候使用了索引:
1 | root@db 12:22: [tmp_opa_user]> delete a from test_data a inner join (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t) d on d.id = a.id; |
删除8万条数据执行了4.1s
4、以上两种方法,3的方法会比2的方法快,尤其是在根据二级索引删除的时候,方法3比方法2提升N个档次
1 | delete from test_data where update_time in (select update_time from (select update_time from test_data group by update_time having count(1) > 1 limit 1000) as t); |
1 | delete a from test_data a inner join (select update_time from (select update_time from test_data group by update_time having count(1) > 1) as t) d on d.update_time = a.update_time; |
感兴趣的可以用这两个sql测试一下性能
5、扩展
如果感觉一次性删除数据量太大,可以使用limit来限制条数,例如:delete a from test_data a inner join (select id from (select min(id) as id,update_time from test_data group by update_time having count(1) > 1 ) as t limit 1000 ) d on d.update_time = a.update_time;